Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 163994 |
| Missing cells | 6186 |
| Missing cells (%) | 0.3% |
| Duplicate rows | 6 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 18.8 MiB |
| Average record size in memory | 120.0 B |
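The overview figures can be cross-checked against the per-variable sections.

The missing-cell total is the sum of each column's missing count:

- eight columns with 7 each (Loan_Amount, Term, Interest_Rate, Home_Ownership, Verification_Status, Loan_Purpose, State, Debt_to_Income)
- Employment_Years (5811)
- Annual_Income (11)
- three columns with 36 each (Delinquent_2yr, Total_Accounts, Longest_Credit_Length)
- Revolving_Cr_Util (200)
- Bad_Loan (0)

The memory line assumes a shallow count of 8 bytes per cell for all 15 columns. This is an assumption, since the report does not say how it measures memory.

$$
\begin{aligned}
\text{Missing cells} &= 8 \times 7 + 5811 + 11 + 3 \times 36 + 200 + 0 = 6186,\\
\text{Missing cells (\%)} &= \frac{6186}{15 \times 163994} = \frac{6186}{2459910} \approx 0.25\% \quad (\text{displayed as } 0.3\%),\\
\text{Average record size} &= 15 \times 8\ \text{B} = 120\ \text{B}, \qquad 163994 \times 120\ \text{B} = 19679280\ \text{B} \approx 18.8\ \text{MiB}.
\end{aligned}
$$

The same 8-byte assumption also matches the 1.3 MiB memory size listed for every variable (163994 × 8 B ≈ 1.25 MiB).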
Variable types
| NUM | 9 |
|---|---|
| CAT | 6 |
Reproduction
| Analysis started | 2021-03-22 13:47:55.167736 |
|---|---|
| Analysis finished | 2021-03-22 13:48:59.367501 |
| Duration | 1 minute and 4.2 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Dataset has 6 (< 0.1%) duplicate rows | Duplicates |
|---|---|
| Employment_Years has 5811 (3.5%) missing values | Missing |
| Annual_Income is highly skewed (γ1 = 35.49006707) | Skewed |
| Delinquent_2yr has 139459 (85.0%) zeros | Zeros |
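The duplicate, missing and zeros checks behind these warnings can be sketched in plain Python. This is a minimal illustration, not pandas-profiling's own code.

- `count_duplicate_rows`, `column_warnings` and both thresholds are hypothetical.
- The thresholds were picked only so that they agree with the warnings above. Employment_Years (3.5% missing) is flagged while Revolving_Cr_Util (0.1% missing) is not. Delinquent_2yr (85.0% zeros) is flagged while Revolving_Cr_Util (1.0% zeros) is not.
- Duplicates are counted the way pandas' `DataFrame.duplicated()` counts them with its default `keep="first"`, that is, every copy after the first.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical thresholds, chosen only to agree with the warnings above;
# the report does not state the cut-offs it actually uses.
MISSING_THRESHOLD = 0.01  # warn when more than 1% of a column is missing
ZEROS_THRESHOLD = 0.5     # warn when more than 50% of a column is exactly zero


def count_duplicate_rows(rows: Sequence[Tuple]) -> int:
    """Count every copy of a row after its first occurrence."""
    return sum(count - 1 for count in Counter(rows).values())


def column_warnings(name: str, values: Sequence[Optional[float]]) -> List[str]:
    """Return Missing/Zeros messages for one column; None marks a missing cell."""
    n = len(values)
    if n == 0:
        return []
    n_missing = sum(v is None for v in values)
    # As in "139459 (85.0%) zeros", the share is taken over all rows.
    n_zeros = sum(1 for v in values if v is not None and v == 0)
    messages = []
    if n_missing / n > MISSING_THRESHOLD:
        messages.append(f"{name} has {n_missing} ({n_missing / n:.1%}) missing values")
    if n_zeros / n > ZEROS_THRESHOLD:
        messages.append(f"{name} has {n_zeros} ({n_zeros / n:.1%}) zeros")
    return messages


# Toy data, not taken from the dataset.
print(count_duplicate_rows([(5000.0, "RENT"), (2500.0, "RENT"), (5000.0, "RENT")]))  # 1
print(column_warnings("toy_column", [0, 0, 0, None, 1]))
```

A skewness check would follow the same pattern, comparing a sample skewness against a further hypothetical threshold.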
Loan_Amount
Real number (ℝ≥0)
| Distinct count | 1274 |
|---|---|
| Unique (%) | 0.8% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13074.169141456336 |
| Minimum | 500.0 |
| Maximum | 35000.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 500 |
|---|---|
| 5-th percentile | 3000 |
| Q1 | 7000 |
| median | 11325 |
| Q3 | 18000 |
| 95-th percentile | 30000 |
| Maximum | 35000 |
| Range | 34500 |
| Interquartile range (IQR) | 11000 |
Descriptive statistics
| Standard deviation | 7993.556189 |
|---|---|
| Coefficient of variation (CV) | 0.6114007018 |
| Kurtosis | 0.2289414268 |
| Mean | 13074.16914 |
| Median Absolute Deviation (MAD) | 5125 |
| Skewness | 0.87534248 |
| Sum | 2143993775 |
| Variance | 63896940.54 |
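The Loan_Amount summary values are tied together by the usual definitions. The worked check below uses the 163987 non-missing values (163994 rows minus 7 missing). It assumes the standard deviation shown is the sample standard deviation $s$.

$$
\begin{aligned}
\bar{x} &= \frac{\text{Sum}}{n} = \frac{2143993775}{163987} \approx 13074.17, \qquad n = 163994 - 7,\\
\text{Range} &= 35000 - 500 = 34500, \qquad \text{IQR} = Q_3 - Q_1 = 18000 - 7000 = 11000,\\
s^2 &= 7993.556189^2 \approx 63896940.5, \qquad \text{CV} = \frac{s}{\bar{x}} = \frac{7993.556189}{13074.16914} \approx 0.6114.
\end{aligned}
$$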
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10000 | 11795 | 7.2% | |
| 12000 | 9164 | 5.6% | |
| 15000 | 7598 | 4.6% | |
| 20000 | 6864 | 4.2% | |
| 8000 | 5860 | 3.6% | |
| 6000 | 5830 | 3.6% | |
| 5000 | 5582 | 3.4% | |
| 35000 | 4478 | 2.7% | |
| 16000 | 4024 | 2.5% | |
| 18000 | 3668 | 2.2% | |
| Other values (1264) | 99124 | 60.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500 | 11 | < 0.1% | |
| 550 | 1 | < 0.1% | |
| 600 | 6 | < 0.1% | |
| 700 | 3 | < 0.1% | |
| 725 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35000 | 4478 | 2.7% | |
| 34975 | 6 | < 0.1% | |
| 34900 | 2 | < 0.1% | |
| 34875 | 2 | < 0.1% | |
| 34850 | 2 | < 0.1% |
Term
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Memory size | 1.3 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 36 | 129950 | 79.2% | |
| 60 | 34037 | 20.8% | |
| (Missing) | 7 | < 0.1% |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 3.999957316 |
| Min length | 3 |
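The length statistics appear to be measured on the string form of each value. The figures are consistent with two inferences, neither of which the report states:

- The 163987 non-missing values are 4 characters long (for example `36.0`, matching how Term appears in the first rows).
- The 7 missing values are counted as 3-character strings (for example `nan`).

$$
\text{Mean length} = \frac{163987 \times 4 + 7 \times 3}{163994} = 4 - \frac{7}{163994} \approx 3.999957
$$

The same reasoning reproduces State's mean length, $2 + 7/163994 \approx 2.000043$.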
Interest_Rate
Real number (ℝ≥0)
| Distinct count | 512 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.715904065566173 |
| Minimum | 5.42 |
| Maximum | 26.06 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 5.42 |
|---|---|
| 5-th percentile | 6.62 |
| Q1 | 10.65 |
| median | 13.49 |
| Q3 | 16.32 |
| 95-th percentile | 21.67 |
| Maximum | 26.06 |
| Range | 20.64 |
| Interquartile range (IQR) | 5.67 |
Descriptive statistics
| Standard deviation | 4.391939871 |
|---|---|
| Coefficient of variation (CV) | 0.3202078295 |
| Kurtosis | -0.3207745705 |
| Mean | 13.71590407 |
| Median Absolute Deviation (MAD) | 2.84 |
| Skewness | 0.3278648857 |
| Sum | 2249229.96 |
| Variance | 19.28913583 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.12 | 5182 | 3.2% | |
| 13.11 | 4627 | 2.8% | |
| 7.9 | 4589 | 2.8% | |
| 8.9 | 4516 | 2.8% | |
| 15.31 | 3494 | 2.1% | |
| 16.29 | 3458 | 2.1% | |
| 10.99 | 3454 | 2.1% | |
| 14.33 | 3321 | 2.0% | |
| 6.03 | 3267 | 2.0% | |
| 11.14 | 3252 | 2.0% | |
| Other values (502) | 124827 | 76.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.42 | 573 | 0.3% | |
| 5.79 | 405 | 0.2% | |
| 5.93 | 11 | < 0.1% | |
| 5.99 | 347 | 0.2% | |
| 6 | 34 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 26.06 | 70 | < 0.1% | |
| 25.99 | 79 | < 0.1% | |
| 25.89 | 117 | 0.1% | |
| 25.83 | 151 | 0.1% | |
| 25.8 | 206 | 0.1% |
Employment_Years
Real number (ℝ≥0)
| Distinct count | 11 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 5811 |
| Missing (%) | 3.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.729389378125336 |
| Minimum | 0.5 |
| Maximum | 10.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0.5 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 2 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 10 |
| Maximum | 10 |
| Range | 9.5 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 3.541944849 |
|---|---|
| Coefficient of variation (CV) | 0.618206342 |
| Kurtosis | -1.518433619 |
| Mean | 5.729389378 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.05799236246 |
| Sum | 906292 |
| Variance | 12.54537332 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 47183 | 28.8% | |
| 2 | 15766 | 9.6% | |
| 0.5 | 14248 | 8.7% | |
| 3 | 13611 | 8.3% | |
| 5 | 12347 | 7.5% | |
| 1 | 11414 | 7.0% | |
| 4 | 11024 | 6.7% | |
| 6 | 10000 | 6.1% | |
| 7 | 9079 | 5.5% | |
| 8 | 7424 | 4.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 14248 | 8.7% | |
| 1 | 11414 | 7.0% | |
| 2 | 15766 | 9.6% | |
| 3 | 13611 | 8.3% | |
| 4 | 11024 | 6.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 47183 | 28.8% | |
| 9 | 6087 | 3.7% | |
| 8 | 7424 | 4.5% | |
| 7 | 9079 | 5.5% | |
| 6 | 10000 | 6.1% |
Home_Ownership
Categorical
| Distinct count | 6 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Memory size | 1.3 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| MORTGAGE | 79714 | 48.6% | |
| RENT | 70526 | 43.0% | |
| OWN | 13560 | 8.3% | |
| OTHER | 156 | 0.1% | |
| NONE | 30 | < 0.1% | |
| ANY | 1 | < 0.1% | |
| (Missing) | 7 | < 0.1% |
Length
| Max length | 8 |
|---|---|
| Median length | 4 |
| Mean length | 5.862531556 |
| Min length | 3 |
Annual_Income
Real number (ℝ≥0)
| Distinct count | 14112 |
|---|---|
| Unique (%) | 8.6% |
| Missing | 11 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 71915.670519749 |
| Minimum | 1896.0 |
| Maximum | 7141778.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 1896 |
|---|---|
| 5-th percentile | 27000 |
| Q1 | 45000 |
| median | 61000 |
| Q3 | 85000 |
| 95-th percentile | 145000 |
| Maximum | 7141778 |
| Range | 7139882 |
| Interquartile range (IQR) | 40000 |
Descriptive statistics
| Standard deviation | 59070.91565 |
|---|---|
| Coefficient of variation (CV) | 0.8213914329 |
| Kurtosis | 3267.743467 |
| Mean | 71915.67052 |
| Median Absolute Deviation (MAD) | 19217 |
| Skewness | 35.49006707 |
| Sum | 1.17929474e+10 |
| Variance | 3489373076 |
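Annual_Income's skewness of 35.49 and kurtosis of 3267.7 are driven by a handful of extreme incomes: the maximum is 7141778 against a 95th percentile of 145000.

The sketch below implements the bias-corrected estimators that pandas' `Series.skew` and `Series.kurt` compute. The adjusted Fisher-Pearson skewness G1 and the excess kurtosis G2 are each built from sums of powered deviations from the mean. Two points are assumptions here:

- That the report uses these pandas methods.
- The function names, which are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def sample_skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (bias-corrected sample skewness)."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = fmean(xs)
    s2 = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    s3 = sum((x - mean) ** 3 for x in xs)  # sum of cubed deviations
    if s2 == 0:
        return 0.0  # constant data: return 0 instead of dividing by zero
    return n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5


def sample_excess_kurtosis(xs: Sequence[float]) -> float:
    """Bias-corrected excess kurtosis G2 (about 0 for normally distributed data)."""
    n = len(xs)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = fmean(xs)
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)  # sum of fourth-power deviations
    if s2 == 0:
        return 0.0
    scaled = n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
    return scaled - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


# Toy incomes: one extreme value dominates both statistics.
incomes = [40000, 45000, 50000, 55000, 60000, 65000, 70000, 7141778]
print(sample_skewness(incomes), sample_excess_kurtosis(incomes))
```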
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 60000 | 6325 | 3.9% | |
| 50000 | 5379 | 3.3% | |
| 65000 | 4415 | 2.7% | |
| 40000 | 4393 | 2.7% | |
| 45000 | 4208 | 2.6% | |
| 70000 | 4172 | 2.5% | |
| 75000 | 3940 | 2.4% | |
| 80000 | 3832 | 2.3% | |
| 55000 | 3729 | 2.3% | |
| 90000 | 2916 | 1.8% | |
| Other values (14102) | 120674 | 73.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1896 | 1 | < 0.1% | |
| 2000 | 1 | < 0.1% | |
| 3000 | 1 | < 0.1% | |
| 3300 | 1 | < 0.1% | |
| 3500 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7141778 | 1 | < 0.1% | |
| 6100000 | 1 | < 0.1% | |
| 6000000 | 1 | < 0.1% | |
| 5000000 | 1 | < 0.1% | |
| 4900000 | 1 | < 0.1% |
Verification_Status
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Memory size | 1.3 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| VERIFIED - income | 60875 | 37.1% | |
| not verified | 59155 | 36.1% | |
| VERIFIED - income source | 43957 | 26.8% | |
| (Missing) | 7 | < 0.1% |
Length
| Max length | 24 |
|---|---|
| Median length | 17 |
| Mean length | 17.07211239 |
| Min length | 3 |
Loan_Purpose
Categorical
| Distinct count | 14 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Memory size | 1.3 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| debt_consolidation | 93261 | 56.9% | |
| credit_card | 30792 | 18.8% | |
| other | 10492 | 6.4% | |
| home_improvement | 9872 | 6.0% | |
| major_purchase | 4686 | 2.9% | |
| small_business | 3841 | 2.3% | |
| car | 2842 | 1.7% | |
| medical | 2029 | 1.2% | |
| wedding | 1751 | 1.1% | |
| moving | 1464 | 0.9% | |
| Other values (4) | 2957 | 1.8% |
Length
| Max length | 18 |
|---|---|
| Median length | 18 |
| Mean length | 14.71852629 |
| Min length | 3 |
State
Categorical
| Distinct count | 50 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Memory size | 1.3 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| CA | 28702 | 17.5% | |
| NY | 14285 | 8.7% | |
| TX | 12128 | 7.4% | |
| FL | 11396 | 6.9% | |
| NJ | 6457 | 3.9% | |
| IL | 6099 | 3.7% | |
| PA | 5427 | 3.3% | |
| VA | 5282 | 3.2% | |
| GA | 5189 | 3.2% | |
| OH | 4896 | 3.0% | |
| Other values (40) | 64126 | 39.1% |
Length
| Max length | 3 |
|---|---|
| Median length | 2 |
| Mean length | 2.000042684 |
| Min length | 2 |
Debt_to_Income
Real number (ℝ≥0)
| Distinct count | 3735 |
|---|---|
| Unique (%) | 2.3% |
| Missing | 7 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.881530121290105 |
| Minimum | 0.0 |
| Maximum | 39.99 |
| Zeros | 270 |
| Zeros (%) | 0.2% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3.79 |
| Q1 | 10.23 |
| median | 15.62 |
| Q3 | 21.26 |
| 95-th percentile | 29.02 |
| Maximum | 39.99 |
| Range | 39.99 |
| Interquartile range (IQR) | 11.03 |
Descriptive statistics
| Standard deviation | 7.587668224 |
|---|---|
| Coefficient of variation (CV) | 0.4777668251 |
| Kurtosis | -0.523370475 |
| Mean | 15.88153012 |
| Median Absolute Deviation (MAD) | 5.51 |
| Skewness | 0.1821600668 |
| Sum | 2604364.48 |
| Variance | 57.57270908 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 270 | 0.2% | |
| 16.8 | 152 | 0.1% | |
| 14.4 | 141 | 0.1% | |
| 19.2 | 140 | 0.1% | |
| 18 | 135 | 0.1% | |
| 12 | 134 | 0.1% | |
| 15.6 | 129 | 0.1% | |
| 13.2 | 127 | 0.1% | |
| 20.4 | 126 | 0.1% | |
| 21.6 | 123 | 0.1% | |
| Other values (3725) | 162510 | 99.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 270 | 0.2% | |
| 0.01 | 6 | < 0.1% | |
| 0.02 | 8 | < 0.1% | |
| 0.03 | 3 | < 0.1% | |
| 0.04 | 5 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 39.99 | 1 | < 0.1% | |
| 39.93 | 1 | < 0.1% | |
| 39.88 | 2 | < 0.1% | |
| 39.85 | 1 | < 0.1% | |
| 39.84 | 2 | < 0.1% |
Delinquent_2yr
Real number (ℝ≥0)
| Distinct count | 19 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 36 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2273570060625282 |
| Minimum | 0.0 |
| Maximum | 29.0 |
| Zeros | 139459 |
| Zeros (%) | 85.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 29 |
| Range | 29 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.6941679229 |
|---|---|
| Coefficient of variation (CV) | 3.05320665 |
| Kurtosis | 72.66758384 |
| Mean | 0.2273570061 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.960591595 |
| Sum | 37277 |
| Variance | 0.4818691052 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 139459 | 85.0% | |
| 1 | 17158 | 10.5% | |
| 2 | 4635 | 2.8% | |
| 3 | 1488 | 0.9% | |
| 4 | 579 | 0.4% | |
| 5 | 310 | 0.2% | |
| 6 | 144 | 0.1% | |
| 7 | 68 | < 0.1% | |
| 8 | 42 | < 0.1% | |
| 9 | 26 | < 0.1% | |
| Other values (9) | 49 | < 0.1% | |
| (Missing) | 36 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 139459 | 85.0% | |
| 1 | 17158 | 10.5% | |
| 2 | 4635 | 2.8% | |
| 3 | 1488 | 0.9% | |
| 4 | 579 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 29 | 1 | < 0.1% | |
| 18 | 3 | < 0.1% | |
| 16 | 1 | < 0.1% | |
| 15 | 2 | < 0.1% | |
| 14 | 4 | < 0.1% |
Revolving_Cr_Util
Real number (ℝ≥0)
| Distinct count | 1170 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 200 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 54.07917280242256 |
| Minimum | 0.0 |
| Maximum | 150.7 |
| Zeros | 1562 |
| Zeros (%) | 1.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8.7 |
| Q1 | 35.6 |
| median | 55.8 |
| Q3 | 74.2 |
| 95-th percentile | 92.5 |
| Maximum | 150.7 |
| Range | 150.7 |
| Interquartile range (IQR) | 38.6 |
Descriptive statistics
| Standard deviation | 25.28536677 |
|---|---|
| Coefficient of variation (CV) | 0.4675620106 |
| Kurtosis | -0.8046268733 |
| Mean | 54.0791728 |
| Median Absolute Deviation (MAD) | 19.2 |
| Skewness | -0.2489493863 |
| Sum | 8857844.03 |
| Variance | 639.3497725 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1562 | 1.0% | |
| 63 | 271 | 0.2% | |
| 62 | 271 | 0.2% | |
| 53 | 270 | 0.2% | |
| 58 | 266 | 0.2% | |
| 70.1 | 263 | 0.2% | |
| 61.3 | 261 | 0.2% | |
| 70.8 | 259 | 0.2% | |
| 65 | 257 | 0.2% | |
| 57 | 257 | 0.2% | |
| Other values (1160) | 159857 | 97.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1562 | 1.0% | |
| 0.01 | 1 | < 0.1% | |
| 0.03 | 1 | < 0.1% | |
| 0.04 | 1 | < 0.1% | |
| 0.05 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 150.7 | 1 | < 0.1% | |
| 129.4 | 1 | < 0.1% | |
| 128.1 | 1 | < 0.1% | |
| 120.2 | 1 | < 0.1% | |
| 119 | 1 | < 0.1% |
Total_Accounts
Real number (ℝ≥0)
| Distinct count | 96 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 36 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.57973383427463 |
| Minimum | 1.0 |
| Maximum | 118.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 16 |
| median | 23 |
| Q3 | 31 |
| 95-th percentile | 46 |
| Maximum | 118 |
| Range | 117 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 11.68519037 |
|---|---|
| Coefficient of variation (CV) | 0.4753993857 |
| Kurtosis | 0.6264591488 |
| Mean | 24.57973383 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.7671960425 |
| Sum | 4030044 |
| Variance | 136.5436739 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 5970 | 3.6% | |
| 21 | 5947 | 3.6% | |
| 22 | 5824 | 3.6% | |
| 23 | 5816 | 3.5% | |
| 17 | 5800 | 3.5% | |
| 19 | 5745 | 3.5% | |
| 18 | 5724 | 3.5% | |
| 24 | 5569 | 3.4% | |
| 16 | 5487 | 3.3% | |
| 25 | 5410 | 3.3% | |
| Other values (86) | 106666 | 65.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 21 | < 0.1% | |
| 2 | 49 | < 0.1% | |
| 3 | 323 | 0.2% | |
| 4 | 782 | 0.5% | |
| 5 | 1134 | 0.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 118 | 1 | < 0.1% | |
| 102 | 1 | < 0.1% | |
| 99 | 2 | < 0.1% | |
| 95 | 1 | < 0.1% | |
| 94 | 1 | < 0.1% |
Bad_Loan
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.3 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| GOOD | 133978 | 81.7% | |
| BAD | 30016 | 18.3% |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 3.816968913 |
| Min length | 3 |
Longest_Credit_Length
Real number (ℝ≥0)
| Distinct count | 63 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 36 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.854273655448347 |
| Minimum | 0.0 |
| Maximum | 65.0 |
| Zeros | 11 |
| Zeros (%) | < 0.1% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 10 |
| median | 14 |
| Q3 | 18 |
| 95-th percentile | 28 |
| Maximum | 65 |
| Range | 65 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 6.947732923 |
|---|---|
| Coefficient of variation (CV) | 0.4677261968 |
| Kurtosis | 1.961383604 |
| Mean | 14.85427366 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 1.131950456 |
| Sum | 2435477 |
| Variance | 48.27099276 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 12756 | 7.8% | |
| 13 | 12540 | 7.6% | |
| 11 | 11889 | 7.2% | |
| 14 | 11287 | 6.9% | |
| 15 | 9778 | 6.0% | |
| 10 | 9649 | 5.9% | |
| 16 | 8417 | 5.1% | |
| 17 | 7719 | 4.7% | |
| 9 | 7649 | 4.7% | |
| 8 | 7032 | 4.3% | |
| Other values (53) | 65242 | 39.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11 | < 0.1% | |
| 1 | 67 | < 0.1% | |
| 2 | 100 | 0.1% | |
| 3 | 914 | 0.6% | |
| 4 | 2477 | 1.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 65 | 1 | < 0.1% | |
| 61 | 2 | < 0.1% | |
| 60 | 2 | < 0.1% | |
| 59 | 1 | < 0.1% | |
| 58 | 3 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables. This implies that for an exact linear relationship Y = a + bX, the slope affects r only through its sign: r = 1 for any b > 0 and r = -1 for any b < 0. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
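The definition translates into a few lines of standard-library Python. This is a minimal sketch; `pearson_r` is an illustrative helper, not an API of the report or of pandas.

The 1/(n-1) factors in the covariance and in both standard deviations cancel, so plain sums of deviations are enough.

```python
import math
from statistics import fmean
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = fmean(xs), fmean(ys)
    # Sums of deviation products; the shared 1/(n-1) normalisation cancels.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 7]
print(pearson_r(x, [2 * v + 1 for v in x]))  # exact linear relation with positive slope
# Shifting and positively rescaling x leaves r unchanged (up to rounding).
print(pearson_r(x, y), pearson_r([10 * v + 7 for v in x], y))
```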
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and 1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
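A sketch of the rank-based definition, again in standard-library Python with hypothetical helper names. Ties receive the average of the ranks they span, the usual convention. This matters for heavily tied columns such as Employment_Years; whether the report treats ties exactly this way is an assumption.

The usage lines contrast ρ with r on a relationship that is strictly increasing but not linear.

```python
import math
from statistics import fmean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(pearson_r(x, y), spearman_rho(x, y))  # r is below 1, rho is 1
```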
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and 1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
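The pair-counting definition translates directly into code. This sketch computes the form described above, often called τ_a, in which tied pairs count as neither concordant nor discordant. `kendall_tau_a` is a hypothetical helper.

- **Cost.** The sketch is O(n²), so it suits illustration rather than 163994 rows.
- **Ties.** Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ_b and use faster algorithms. Results on heavily tied data like Delinquent_2yr can therefore differ from this sketch.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same direction of change in X and Y -> concordant; opposite -> discordant.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant and 1 discordant pair out of 6, so tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```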
Phik (φk)
Phik (φk) is a new and practical correlation coefficient. It works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
First rows
| | Loan_Amount | Term | Interest_Rate | Employment_Years | Home_Ownership | Annual_Income | Verification_Status | Loan_Purpose | State | Debt_to_Income | Delinquent_2yr | Revolving_Cr_Util | Total_Accounts | Bad_Loan | Longest_Credit_Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5000.0 | 36.0 | 10.65 | 10.0 | RENT | 24000.0 | VERIFIED - income | credit_card | AZ | 27.65 | 0.0 | 83.7 | 9.0 | GOOD | 26.0 |
| 1 | 2500.0 | 60.0 | 15.27 | 0.5 | RENT | 30000.0 | VERIFIED - income source | car | GA | 1.00 | 0.0 | 9.4 | 4.0 | BAD | 12.0 |
| 2 | 2400.0 | 36.0 | 15.96 | 10.0 | RENT | 12252.0 | not verified | small_business | IL | 8.72 | 0.0 | 98.5 | 10.0 | GOOD | 10.0 |
| 3 | 10000.0 | 36.0 | 13.49 | 10.0 | RENT | 49200.0 | VERIFIED - income source | other | CA | 20.00 | 0.0 | 21.0 | 37.0 | GOOD | 15.0 |
| 4 | 5000.0 | 36.0 | 7.90 | 3.0 | RENT | 36000.0 | VERIFIED - income source | wedding | AZ | 11.20 | 0.0 | 28.3 | 12.0 | GOOD | 7.0 |
| 5 | 3000.0 | 36.0 | 18.64 | 9.0 | RENT | 48000.0 | VERIFIED - income source | car | CA | 5.35 | 0.0 | 87.5 | 4.0 | GOOD | 4.0 |
| 6 | 5600.0 | 60.0 | 21.28 | 4.0 | OWN | 40000.0 | VERIFIED - income source | small_business | CA | 5.55 | 0.0 | 32.6 | 13.0 | BAD | 7.0 |
| 7 | 5375.0 | 60.0 | 12.69 | 0.5 | RENT | 15000.0 | VERIFIED - income | other | TX | 18.08 | 0.0 | 36.5 | 3.0 | BAD | 7.0 |
| 8 | 6500.0 | 60.0 | 14.65 | 5.0 | OWN | 72000.0 | not verified | debt_consolidation | AZ | 16.12 | 0.0 | 20.6 | 23.0 | GOOD | 13.0 |
| 9 | 12000.0 | 36.0 | 12.69 | 10.0 | OWN | 75000.0 | VERIFIED - income source | debt_consolidation | CA | 10.78 | 0.0 | 67.1 | 34.0 | GOOD | 22.0 |
Last rows
| | Loan_Amount | Term | Interest_Rate | Employment_Years | Home_Ownership | Annual_Income | Verification_Status | Loan_Purpose | State | Debt_to_Income | Delinquent_2yr | Revolving_Cr_Util | Total_Accounts | Bad_Loan | Longest_Credit_Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 163984 | 11975.0 | 36.0 | 22.99 | 2.0 | RENT | 40000.0 | VERIFIED - income | small_business | TX | 25.32 | 0.0 | 8.6 | 14.0 | GOOD | 6.0 |
| 163985 | 4000.0 | 36.0 | 6.99 | 10.0 | MORTGAGE | 58000.0 | VERIFIED - income source | debt_consolidation | WI | 1.03 | 0.0 | 0.5 | 18.0 | GOOD | 19.0 |
| 163986 | 2000.0 | 36.0 | 8.19 | 8.0 | MORTGAGE | 31000.0 | VERIFIED - income source | credit_card | CA | 8.45 | 0.0 | 16.9 | 48.0 | GOOD | 11.0 |
| 163987 | 7000.0 | 36.0 | 13.66 | 10.0 | RENT | 48681.0 | not verified | debt_consolidation | NY | 10.85 | 1.0 | 58.1 | 41.0 | GOOD | 20.0 |
| 163988 | 26500.0 | 36.0 | 23.99 | 8.0 | MORTGAGE | 170000.0 | VERIFIED - income source | small_business | NJ | 5.89 | 0.0 | 23.6 | 9.0 | GOOD | 6.0 |
| 163989 | 15000.0 | 60.0 | 12.39 | 3.0 | MORTGAGE | 45000.0 | not verified | credit_card | OK | 31.44 | 4.0 | 75.8 | 34.0 | GOOD | 20.0 |
| 163990 | 20000.0 | 36.0 | 14.99 | 10.0 | OWN | 80000.0 | VERIFIED - income | home_improvement | VA | 23.65 | 0.0 | 68.8 | 18.0 | GOOD | 22.0 |
| 163991 | 12825.0 | 36.0 | 17.14 | 6.0 | MORTGAGE | 38000.0 | not verified | debt_consolidation | TX | 9.03 | 0.0 | 70.7 | 24.0 | GOOD | 9.0 |
| 163992 | 27650.0 | 60.0 | 21.99 | 0.5 | RENT | 60000.0 | VERIFIED - income source | credit_card | NY | 10.10 | 1.0 | 61.2 | 20.0 | GOOD | 6.0 |
| 163993 | 17000.0 | 60.0 | 15.99 | 10.0 | MORTGAGE | 63078.0 | VERIFIED - income source | debt_consolidation | PA | 31.70 | 0.0 | 54.0 | 28.0 | GOOD | 16.0 |